A DATA ANALYSIS OF SUCCESS IN OCS,

THE USE OF ASVAB WAIVERS,

AND RACE

R.  R.  Read 
L.  R.  Whitaker 

Abstract 

Success in Officer Candidate School (OCS) occurs at the same rate regardless of whether candidates received a mental aptitude qualification waiver based on their score on the electronics portion of the Armed Services Vocational Aptitude Battery (ASVAB). However, these rates do change with race and time, and the result is an apparent contradiction: the macro rates (those computed overall, without distinguishing race and time) exhibit different success rates depending on whether a waiver is present. The data are studied to expose the source of the contradiction and to develop sharper models.

I.    Introduction 

The accession of officers into the Marine Corps uses one of three mental aptitude test scores: the Armed Services Vocational Aptitude Battery Electronics Repair Composite (called ASVAB herein), the Scholastic Aptitude Test (SAT), or the American College Test (ACT). Historically, 55% of the officers entering the Corps have used the first of these three, for which the qualification threshold is a score of 120. A candidate can, however, receive a waiver of this minimum provided his score is 115 or better. This paper treats only candidates using the ASVAB.

Based on data collected over fiscal years 1988 through 1992 and broken out by race, personnel in the Manpower Analysis (MA) Branch at Marine Corps Headquarters noticed that success at Officer Candidate School (OCS) appears to be independent of whether a candidate has received an ASVAB waiver. Specifically, there are four racial groups: Caucasian, Black, Hispanic, and Other. The Other group consists largely of American Indian, Alaskan Native, Asian, and Pacific Islander candidates. When the data are collapsed over time, the four 2x2 contingency table tests for independence yield chi square statistics of .6678, 2.841, .7983, and .5767 for the respective races, each with one degree of freedom. None of these is significant at the .05 level (critical value 3.841). However, when the data are further collapsed over race and a single test for independence is performed, the relationship is highly significant. This latter 2x2 table appears in Table 1; the chi square statistic is 11.87 and the p-value is 0.00057.

On  the  surface,  it  appears  that  we  have  contradictory  results.  On  the  one 
hand,  OCS  candidate  success  and  the  presence  of  a  waiver  are  independent  when 
Caucasians,  Blacks,  Hispanics  and  Others  are  considered  separately.  On  the  other 
hand,  there  is  dependence  in  the  collapsed  table  when  race  is  not  accounted  for, 
with  strong  evidence  that  the  chance  of  success  without  a  waiver  is  greater  than 
that  with  a  waiver. 

Table 1. Macro Analysis of Success and Waiver

             Waiver    No Waiver    Total
Success         754         7449     8203
Failure         299         2303     2602
Total          1053         9752    10805
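The figures quoted above can be checked directly from the counts. The short Python sketch below (standard library only; the helper names are ours, not the authors') recomputes the Pearson chi square statistic for Table 1 and for the four race-specific tables obtained by summing Table 2 over fiscal years. The one-degree-of-freedom p-value uses the identity P(chi square_1 > x) = erfc(sqrt(x/2)).

```python
import math


def pearson_chi2(n11, n12, n21, n22):
    """Pearson X^2 for a 2x2 table (rows: success/failure; columns: waiver/no waiver)."""
    table = [[n11, n12], [n21, n22]]
    rows = [n11 + n12, n21 + n22]
    cols = [n11 + n21, n12 + n22]
    total = rows[0] + rows[1]
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total  # m_ij = n_i+ n_+j / N
            x2 += (table[i][j] - expected) ** 2 / expected
    return x2


def p_value_1df(x2):
    """Upper-tail probability of a chi square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


# Table 2 summed over FY88-FY92, per race:
# (success & waiver, success & no waiver, failure & waiver, failure & no waiver)
BY_RACE = {
    "Caucasian": (491, 6312, 153, 1818),
    "Black": (136, 326, 92, 167),
    "Hispanic": (78, 390, 40, 165),
    "Other": (49, 421, 14, 153),
}

race_x2 = {race: pearson_chi2(*cells) for race, cells in BY_RACE.items()}
pooled_x2 = pearson_chi2(754, 7449, 299, 2303)  # Table 1: also summed over race

if __name__ == "__main__":
    for race, x2 in race_x2.items():
        print(f"{race:<10} X2 = {x2:.4f}  p = {p_value_1df(x2):.3f}")
    print(f"{'Pooled':<10} X2 = {pooled_x2:.2f}  p = {p_value_1df(pooled_x2):.5f}")
```

Each race-specific statistic falls below 3.841, while the pooled statistic is about 11.87, which is exactly the pattern that motivates the paper.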

A short answer to the contradiction can be obtained through an interpretation of the two success rates. They are not significantly different for waiver and non-waiver candidates within racial groups, but the rates change sharply from group to group. Indeed, the use of the waiver varies markedly from group to group and, to a lesser extent, from year to year. This is surely related to the implementation of the Marine Corps Affirmative Action Plan.

This paper contains an explanation of the contradiction and draws attention to other interesting facets as well. In Section II the raw data are presented, and all 2 x 2 tables of success/failure by waiver/non-waiver are studied for each year/racial group pair. Generally, independence is tenable. To explain the dependence in the pooled data, the full data, aggregated over years and with race as a factor, are then subjected to a loglinear analysis in Section III. In Section IV, we fit models with time as a factor, including the use of the waiver by year and race. These models could be valuable because an ill-advised long-term overuse of the waiver could lead to inequities in future advancement to higher rank [3].

Categorical data are prevalent in military OR. Thus we take a careful look at the data and provide details that would normally be omitted, so that certain usage may be illustrated. In particular, the next section draws attention to the rather interesting effects that arise when conditional tests are used, and Section III presents the steps for fitting a loglinear model.

The factors of interest are success or failure of candidates at OCS, whether the candidate used an ASVAB (lower mental category) waiver, fiscal year, and race. The data (see Table 2) consist of counts $D_{ijkl}$, where $i = 1, 2$ indicates success or failure, $j = 1, 2$ indicates presence or absence of a waiver, $k = 1, \ldots, 5$ indicates fiscal year FY88 to FY92, and $l = 1, \ldots, 4$ indicates race, in the order given earlier.


Table 2. Frequency Counts by Category

Candidates Qualifying with ASVAB Waiver

Success in OCS
FY       White   Black   Hispanic   Other   Total
FY88       100      11         10      12     133
FY89       142      37         12      20     211
FY90       102      30         20      11     163
FY91        77      22         14       2     115
FY92        70      36         22       4     132
Total      491     136         78      49     754

Failure in OCS
FY       White   Black   Hispanic   Other   Total
FY88        22       8          5       1      36
FY89        30      15         11       7      63
FY90        35      16         10       3      64
FY91        21      22          6       3      52
FY92        45      31          8       0      84
Total      153      92         40      14     299

Candidates Qualifying without ASVAB Waiver

Success in OCS
FY       White   Black   Hispanic   Other   Total
FY88      1113      48         48      95    1304
FY89      1533      56         80     111    1780
FY90      1263      77         76     109    1525
FY91      1013      58         78      39    1188
FY92      1390      87        108      67    1652
Total     6312     326        390     421    7449

Failure in OCS
FY       White   Black   Hispanic   Other   Total
FY88       234      14         16      31     295
FY89       323      18         22      35     398
FY90       350      50         41      38     479
FY91       430      35         38      24     527
FY92       481      50         48      25     604
Total     1818     167        165     153    2303

II.  Individual  Contingency  Tables 

Suppose  the  full  data  are  broken  into  twenty  (5  years,  4  races)  2x2 
contingency  tables  and  subjected  to  individual  analyses.  It  is  instructive  to  apply 
the  most  often  used  procedures  to  each  and  gain  experience  in  their  use  and 
effect. 

Let us simplify the notation and let $n_{ij} = D_{ijkl}$ be the counts with year and race held fixed, where $i = 1, 2$ indicates success or failure in OCS and $j = 1, 2$ indicates presence or absence of a waiver, respectively. Under independence the expected frequencies are estimated by

$$\hat m_{ij} = n_{i+}n_{+j}/N, \qquad N = \sum_i\sum_j n_{ij},$$

where the plus indicates summation over the replaced subscript. The familiar Pearson chi square and log likelihood ratio statistics are given by

$$X^2 = \sum_i\sum_j \frac{(n_{ij} - \hat m_{ij})^2}{\hat m_{ij}}, \qquad G^2 = 2\sum_i\sum_j n_{ij}\ln(n_{ij}/\hat m_{ij}).$$

Each is asymptotically distributed as chi square with one degree of freedom.

The odds ratio is also popular, especially in 2 x 2 tables. It summarizes the strength and type of dependence between the two categories. Letting $\{\pi_{ij}\}$ be the cell probabilities, the odds ratio is defined by

$$\theta = \pi_{11}\pi_{22}/(\pi_{12}\pi_{21})$$

and, in our context, represents the odds of OCS success using waivers divided by the odds of success without waivers. The null value $\theta = 1$ represents "no effect" of waivers, or independence. The maximum likelihood estimator of $\theta$ is

$$\hat\theta = n_{11}n_{22}/(n_{12}n_{21}).$$

The distribution of $\ln(\hat\theta)$ is well approximated by the normal distribution [1], with the variance estimated by

$$\hat\sigma^2(\ln\hat\theta) = \sum_i\sum_j 1/n_{ij}.$$

Thus, a third test statistic is

$$Z = \ln(\hat\theta)\Big/\Big[\sum_i\sum_j 1/n_{ij}\Big]^{1/2}.$$
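As a sketch of how the asymptotic entries of Table 3 can be produced, the function below computes the estimated odds ratio, its logarithm, the estimated standard deviation of the logarithm, and the two-sided p-values for Z, X² and G² from a single 2 x 2 table. It assumes all four cells are positive (the Other/FY92 table, with n21 = 0, is exactly the case where these quantities are undefined); the function and key names are hypothetical.

```python
import math


def two_by_two_summary(n11, n12, n21, n22):
    """Odds ratio and the three asymptotic two-sided p-values of Section II.

    Rows are success/failure, columns are waiver/no waiver.
    Assumes all four cells are positive; otherwise ln(theta) is undefined.
    """
    n = [[n11, n12], [n21, n22]]
    rows = [n11 + n12, n21 + n22]
    cols = [n11 + n21, n12 + n22]
    total = rows[0] + rows[1]
    m = [[rows[i] * cols[j] / total for j in range(2)] for i in range(2)]

    x2 = sum((n[i][j] - m[i][j]) ** 2 / m[i][j] for i in range(2) for j in range(2))
    g2 = 2.0 * sum(n[i][j] * math.log(n[i][j] / m[i][j])
                   for i in range(2) for j in range(2) if n[i][j] > 0)

    theta = (n11 * n22) / (n12 * n21)
    log_theta = math.log(theta)
    sd_log_theta = math.sqrt(sum(1.0 / n[i][j] for i in range(2) for j in range(2)))
    z = log_theta / sd_log_theta

    def upper_tail_1df(x):
        # P(chi square_1 > x) = P(|Z| > sqrt(x))
        return math.erfc(math.sqrt(x / 2.0))

    return {
        "N": total,
        "theta": theta,
        "ln_theta": log_theta,
        "sd_ln_theta": sd_log_theta,
        "p_Z": math.erfc(abs(z) / math.sqrt(2.0)),
        "p_X2": upper_tail_1df(x2),
        "p_G2": upper_tail_1df(g2),
    }


# FY88, Caucasian (Table 2): 100 successes and 22 failures with a waiver,
# 1113 successes and 234 failures without one
fy88_cauc = two_by_two_summary(100, 1113, 22, 234)
```

For the FY88 Caucasian table this gives N = 1469, an odds ratio of about .956, a log odds ratio of about -.045 with standard deviation about .246, and all three asymptotic p-values near .854.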
Concern about the use of asymptotics led the authors to consider Fisher's Exact Test as well [1, p. 60ff]. Under the null hypothesis of independence, an exact distribution that is free of any unknown parameters results from conditioning on the totals in both margins. The result is a hypergeometric distribution,

$$P(n_{11}) = \binom{n_{1+}}{n_{11}}\binom{n_{2+}}{n_{+1} - n_{11}}\bigg/\binom{N}{n_{+1}}.$$

Since the totals in the margins are given, only $n_{11}$ need be considered as variable. Its range is

$$\max(0,\ n_{+1} + n_{1+} - N) \le n_{11} \le \min(n_{+1},\ n_{1+}).$$

Exact two-sided p-values are obtained by summing the probabilities of tables that are at least as rare under the null hypothesis as the observed table; that is, only those tables whose hypergeometric probabilities are no larger than that of the observed configuration are used [2].
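A direct way to carry out the exact test just described is to enumerate every admissible value of n11 and sum the hypergeometric probabilities that do not exceed that of the observed table. The sketch below does this with Python's exact integer binomial coefficients (math.comb, Python 3.8 or later); the function name is hypothetical, and the small relative tolerance for ties is our own choice rather than something specified in [2].

```python
import math


def fisher_exact_two_sided(n11, n12, n21, n22, rel_tol=1e-7):
    """Two-sided Fisher exact p-value by enumerating the hypergeometric distribution.

    Both margins are held fixed, so only n11 varies. Tables whose probability is
    no larger than that of the observed table are summed; rel_tol guards against
    floating-point ties.
    """
    r1, c1 = n11 + n12, n11 + n21          # n_{1+} and n_{+1}
    r2 = n21 + n22                          # n_{2+}
    total = r1 + r2                         # N

    def prob(x):
        # P(n11 = x) = C(n_{1+}, x) C(n_{2+}, n_{+1} - x) / C(N, n_{+1})
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(total, c1)

    lo, hi = max(0, c1 + r1 - total), min(c1, r1)
    p_obs = prob(n11)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + rel_tol))


# FY92, Other (Table 2): 4 successes and 0 failures with a waiver, 67 and 25 without
p_other_fy92 = fisher_exact_two_sided(4, 67, 0, 25)
```

For the Other/FY92 table it gives p of about .570, the value in Table 3. Appendix A describes a more economical tail recursion for the same quantity.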

The results of the four procedures are given in Table 3, which contains the total count N; the odds ratio estimate; its logarithm; the estimated standard deviation of the logarithm; and the four p-values. Within each year the rows are Caucasian, Black, Hispanic, and Other, in that order. Some entries for the last case are blank because $n_{21} = 0$.

Perhaps the first thing to notice is the agreement of the p-values for the three asymptotic procedures; only for the smaller values of N do they show much separation. On the other hand, the p-values for Fisher's Exact Test generally tend to be higher. The main reason for this is the conditioning on both margin totals, which is not done in the other procedures: in Fisher's test the nuisance parameters are eliminated, while in the three asymptotic procedures they are estimated. The differences in p-values do not lead to conflicting conclusions, however. Two of the twenty cases are significant: Hispanics '89 and Caucasians '92. In both of these cases the odds of success are smaller if waivers are used. The opposite is true for Caucasians '91, a case that might be controversial since p is about .08.

Table 3. Two-Sided p-values

                                                      Two-sided p-values
FY    Race        N      θ̂     ln θ̂   σ(ln θ̂)   Fisher     Z      X²     G²
FY88  Cauc.    1469    .956   -.045     .246      .804    .854   .854   .854
      Black      81    .401   -.914     .555      .139    .100   .094   .104
      Hisp.      79    .667   -.405     .619      .527    .513   .511   .518
      Other     139   3.916   1.365    1.061      .298    .198   .168   .126
FY89  Cauc.    2028    .997   -.003     .210     1.000    .990   .990   .990
      Black     126    .793   -.232     .409      .681    .570   .570   .571
      Hisp.     125    .300  -1.204     .482      .017    .012   .010   .014
      Other     173    .901   -.104     .480      .810    .828   .828   .829
FY90  Cauc.    1750    .808   -.213     .205      .285    .297   .296   .304
      Black     173   1.218    .197     .359      .723    .583   .583   .582
      Hisp.     147   1.079    .076     .433     1.000    .861   .861   .860
      Other     161   1.278    .245     .678     1.000    .717   .717   .712
FY91  Cauc.    1541   1.556    .442     .253      .085    .080   .078   .070
      Black     137    .603   -.506     .370      .196    .172   .170   .172
      Hisp.     136   1.137    .128     .527     1.000    .808   .808   .807
      Other      68    .410   -.892     .949      .379    .348   .335   .342
FY92  Cauc.    1986    .538   -.620     .198      .002    .002   .002   .002
      Black     204    .667   -.405     .303      .223    .181   .180   .182
      Hisp.     186   1.222    .200     .448      .828    .654   .654   .651
      Other      96                               .570           .225   .116

III.      General  Models 

The four factors, success/failure, waiver/no waiver, year (1, ..., 5), and race (1, ..., 4), are denoted A, B, C, D, respectively. Since the total number of OCS candidates is not fixed, the data $D_{ijkl}$ will be assumed to be generated from an independent Poisson sampling scheme, i.e., the $D_{ijkl}$ are independent Poisson random variables with respective parameters $m_{ijkl} = E[D_{ijkl}]$. To interpret the results given in the introduction we first fit a loglinear model to the counts collapsed over years, i.e., to

$$D_{ij+l} = \sum_{k=1}^{5} D_{ijkl}.$$

The saturated loglinear model parameterizes $m_{ij+l} = E[D_{ij+l}]$ as

$$\ln m_{ij+l} = \mu + \lambda_i^A + \lambda_j^B + \lambda_l^D + \lambda_{ij}^{AB} + \lambda_{il}^{AD} + \lambda_{jl}^{BD} + \lambda_{ijl}^{ABD}, \qquad i = 1, 2;\ j = 1, 2;\ l = 1, \ldots, 4,$$

where the λ's are the main effects and interaction terms corresponding to the variables A, B, and D. Using standard notation [1], this saturated model can be represented as [ABD], i.e., the third order interaction term ABD and all lower order terms made up of subsets of the variables A, B, and D are included in the model. We begin by fitting the model with all two-way interaction terms along with all main effects, i.e., the model [AB][AD][BD]. This gives a likelihood ratio test statistic of 2.55 with 3 degrees of freedom and a p-value of .466, so this model fits the data. To see whether a more parsimonious model can be fit, we remove two-way interaction terms one at a time. This yields the model [AD][BD]. Its overall likelihood ratio test statistic is 4.84 with 4 degrees of freedom, giving an acceptable p-value of .30. To see whether anything has been lost by removing the AB interaction term, we test the null hypothesis [AD][BD] against the alternative [AB][AD][BD]. The test statistic is the difference of the two likelihood ratio statistics, 4.84 - 2.55 = 2.29, with 4 - 3 = 1 degree of freedom and a p-value of about .13. There is not enough evidence to indicate that the AB term should be included. Further, deleting terms from the [AD][BD] model yields models with unacceptable fits, i.e., models whose likelihood ratio test statistics have p-values less than .05. Finally, the standardized residuals for the [AD][BD] model range from -.843 to 1.090. Thus the model [AD][BD] is selected; it fits the data (collapsed over years) reasonably well.
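The two fits discussed above can be reproduced with a few lines of code. In the sketch below (standard-library Python; helper names are hypothetical), [AD][BD] is fitted in closed form, since its estimates are $\hat m_{ij+l} = n_{i+l}n_{+jl}/n_{++l}$, while [AB][AD][BD] has no closed form and is fitted by iterative proportional fitting. That algorithm is a standard choice we substitute here; the paper does not say which fitting routine the authors used. The difference of the two G² values is the 1-degree-of-freedom test of the AB term.

```python
import math

# Table 2 summed over fiscal years: COUNTS[i][j][l]
# i: 0 = success, 1 = failure; j: 0 = waiver, 1 = no waiver;
# l: 0 = Caucasian, 1 = Black, 2 = Hispanic, 3 = Other
COUNTS = [
    [[491, 136, 78, 49], [6312, 326, 390, 421]],
    [[153, 92, 40, 14], [1818, 167, 165, 153]],
]
RANGE_I, RANGE_J, RANGE_L = range(2), range(2), range(4)


def g2(obs, fit):
    """Likelihood ratio statistic G^2 = 2 * sum n ln(n / m); empty cells contribute 0."""
    return 2.0 * sum(obs[i][j][l] * math.log(obs[i][j][l] / fit[i][j][l])
                     for i in RANGE_I for j in RANGE_J for l in RANGE_L
                     if obs[i][j][l] > 0)


def fit_ad_bd(obs):
    """[AD][BD]: closed-form estimates m_ij+l = n_i+l * n_+jl / n_++l."""
    fit = [[[0.0] * 4 for _ in RANGE_J] for _ in RANGE_I]
    for l in RANGE_L:
        n_l = sum(obs[i][j][l] for i in RANGE_I for j in RANGE_J)
        for i in RANGE_I:
            for j in RANGE_J:
                n_il = sum(obs[i][jj][l] for jj in RANGE_J)
                n_jl = sum(obs[ii][j][l] for ii in RANGE_I)
                fit[i][j][l] = n_il * n_jl / n_l
    return fit


def fit_ab_ad_bd(obs, sweeps=200):
    """[AB][AD][BD]: iterative proportional fitting of the AB, AD and BD margins."""
    fit = [[[1.0] * 4 for _ in RANGE_J] for _ in RANGE_I]
    for _ in range(sweeps):
        for i in RANGE_I:          # match the n_ij+ margin
            for j in RANGE_J:
                s = sum(obs[i][j][l] for l in RANGE_L) / sum(fit[i][j][l] for l in RANGE_L)
                for l in RANGE_L:
                    fit[i][j][l] *= s
        for i in RANGE_I:          # match the n_i+l margin
            for l in RANGE_L:
                s = sum(obs[i][j][l] for j in RANGE_J) / sum(fit[i][j][l] for j in RANGE_J)
                for j in RANGE_J:
                    fit[i][j][l] *= s
        for j in RANGE_J:          # match the n_+jl margin
            for l in RANGE_L:
                s = sum(obs[i][j][l] for i in RANGE_I) / sum(fit[i][j][l] for i in RANGE_I)
                for i in RANGE_I:
                    fit[i][j][l] *= s
    return fit


g2_ad_bd = g2(COUNTS, fit_ad_bd(COUNTS))        # 4 degrees of freedom
g2_ab_ad_bd = g2(COUNTS, fit_ab_ad_bd(COUNTS))  # 3 degrees of freedom
g2_ab_term = g2_ad_bd - g2_ab_ad_bd             # conditional test of AB, 1 degree of freedom
p_ab_term = math.erfc(math.sqrt(g2_ab_term / 2.0))
```

The fixed budget of 200 sweeps is a generous allowance for a table this small rather than a formal convergence test; a production fit would stop when the margins agree to a tolerance.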

The question now becomes whether this model can account for the results that motivated the study. The probabilistic interpretation of the model [AD][BD] is that, conditional on the levels of factor D (race), the variables A and B are independent. To see this, note that the joint probability mass function (pmf) of the variables A, B, C, D is

$$p_{ijkl} = m_{ijkl}/m_{++++}$$

for i = 1, 2; j = 1, 2; k = 1, ..., 5; and l = 1, ..., 4. The model [AD][BD] fitted to the data collapsed over years corresponds to

$$\ln m_{ij+l} = \mu + \lambda_i^A + \lambda_j^B + \lambda_l^D + \lambda_{il}^{AD} + \lambda_{jl}^{BD}. \qquad (2.1)$$

Thus the conditional pmf of A given that B is at level j and D is at level l can be found from this model to be

$$p_{i|jl} = \frac{m_{ij+l}}{m_{+j+l}} = \frac{\exp\{\lambda_i^A + \lambda_{il}^{AD}\}}{\sum_{i'}\exp\{\lambda_{i'}^A + \lambda_{i'l}^{AD}\}}. \qquad (2.2)$$

Since the right-hand side of (2.2) is not a function of j, the conditional pmf of A given B and D is the same as the conditional pmf of A given D. Thus, given D, the factors A and B are independent.


However, A and B are not independent by themselves. The marginal probabilities of these two factors can be developed from the model (2.1) by summing

$$\exp\{\mu + \lambda_i^A\}\sum_j\sum_l \exp\{\lambda_j^B + \lambda_l^D + \lambda_{il}^{AD} + \lambda_{jl}^{BD}\}$$

and

$$\exp\{\mu + \lambda_j^B\}\sum_i\sum_l \exp\{\lambda_i^A + \lambda_l^D + \lambda_{il}^{AD} + \lambda_{jl}^{BD}\}$$

and forming the appropriate normalizations. The joint probability is not, in general, the product of these marginal probabilities, because the race-specific terms $\lambda_{il}^{AD}$ and $\lambda_{jl}^{BD}$ do not factor once the sum over l is taken. Thus the model supports the observation made earlier that success of the OCS candidate is not independent of whether the ASVAB waiver was used for entry. These two variables are independent, however, when broken out by race.

The following probabilities help interpret the dependence between A and B. The probabilities of success given race are estimated to be .78, .64, .70, and .74 for Caucasians, Blacks, Hispanics and Others, respectively. (The empirical rates and the modeled rates are the same to two decimal places.) The proportions of candidates in each race who hold a waiver are .07, .32, .18, and .10, and the proportions who do not are the complementary values, .93, .68, .82, and .90. Caucasians, who have a good chance of success (78%), are also the group least likely to use a waiver: 93% of Caucasian candidates entered without one. Waiver use is much more common among Blacks (32%) and Hispanics (18%). Because the success probabilities for these two groups (64% and 70%) are lower, the waiver group draws a larger share of its members from lower-success groups, and the overall probability of success with a waiver is therefore lower than without one. Also, the four success rates decrease monotonically as the four waiver use rates increase.
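The mechanism described in this paragraph can be made concrete. The sketch below (hypothetical names; counts from Table 2 collapsed over years) computes the race-specific success and waiver-use rates, and then the pooled success rates that would be expected if success were exactly independent of waiver status within each race. Under that assumption, each group's pooled rate is simply the race-specific success rates averaged with that group's racial composition as weights.

```python
# Table 2 summed over FY88-FY92, per race:
# (successes with waiver, failures with waiver, successes without, failures without)
TABLE = {
    "Caucasian": (491, 153, 6312, 1818),
    "Black": (136, 92, 326, 167),
    "Hispanic": (78, 40, 390, 165),
    "Other": (49, 14, 421, 153),
}

success_rate = {}  # P(success | race), ignoring waiver status
waiver_rate = {}   # P(waiver | race)
for race, (s_w, f_w, s_n, f_n) in TABLE.items():
    total = s_w + f_w + s_n + f_n
    success_rate[race] = (s_w + s_n) / total
    waiver_rate[race] = (s_w + f_w) / total

n_waiver = sum(c[0] + c[1] for c in TABLE.values())
n_no_waiver = sum(c[2] + c[3] for c in TABLE.values())

# Rates implied by conditional independence of success and waiver given race:
# race-specific success rates weighted by each group's racial composition.
implied_waiver = sum(success_rate[r] * (c[0] + c[1]) / n_waiver
                     for r, c in TABLE.items())
implied_no_waiver = sum(success_rate[r] * (c[2] + c[3]) / n_no_waiver
                        for r, c in TABLE.items())

observed_waiver = sum(c[0] for c in TABLE.values()) / n_waiver           # 754 / 1053
observed_no_waiver = sum(c[2] for c in TABLE.values()) / n_no_waiver     # 7449 / 9752
```

The implied rates come out at about .735 with a waiver and .762 without, so racial composition alone produces a gap in the same direction as the observed .716 and .764.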
IV.      Temporal Analysis

The  above  analysis  responds  to  the  question  posed  in  the  introduction.  But  it 
is  also  of  interest  to  consider  the  other  factor,  C,  the  fiscal  year.  If  including  the 
variable  race  sheds  light  on  the  dependence  between  having  a  waiver  and 
success  of  the  OCS  candidate,  perhaps  considering  this  fourth  variable  will  add 
to  an  understanding  of  this  data  set. 

Perhaps  the  most  direct  way  to  proceed  is  to  consider  the  most  general  four 
factor  model  that  reflects  independence  of  factors  A  and  B.  In  the  notation 
established  this  would  be  [ACD]  [BCD].  All  interactions  involving  A  and  B  are 
zero.  Doing  so  produces  a  likelihood  ratio  p-value  of  .049.  This  is  rather  small  for 
our  tastes.  Study  of  the  residuals  reveals  two  outlier  cells:  unsuccessful  Hispanics 
with  a  waiver  in  FY89  and  unsuccessful  Caucasians  with  a  waiver  in  FY92.  These 
two  cells  belong  to  the  same  cases  that  exhibited  low  p-values  in  Table  3. 
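For this model the fitted values have a closed form (see Appendix C), so its likelihood ratio statistic is simply the sum of the twenty year-by-race G² statistics for independence, on 20 degrees of freedom. The sketch below (standard-library Python; helper names hypothetical) computes that sum and keeps the per-table contributions so that the largest ones can be identified.

```python
import math

# Table 2; rows FY88..FY92, columns Caucasian, Black, Hispanic, Other
SUCCESS_WAIVER = [[100, 11, 10, 12], [142, 37, 12, 20], [102, 30, 20, 11],
                  [77, 22, 14, 2], [70, 36, 22, 4]]
FAILURE_WAIVER = [[22, 8, 5, 1], [30, 15, 11, 7], [35, 16, 10, 3],
                  [21, 22, 6, 3], [45, 31, 8, 0]]
SUCCESS_NO_WAIVER = [[1113, 48, 48, 95], [1533, 56, 80, 111], [1263, 77, 76, 109],
                     [1013, 58, 78, 39], [1390, 87, 108, 67]]
FAILURE_NO_WAIVER = [[234, 14, 16, 31], [323, 18, 22, 35], [350, 50, 41, 38],
                     [430, 35, 38, 24], [481, 50, 48, 25]]


def g2_independence(n11, n12, n21, n22):
    """G^2 for independence in one 2x2 table; zero cells contribute nothing."""
    cells = [[n11, n12], [n21, n22]]
    rows = [n11 + n12, n21 + n22]
    cols = [n11 + n21, n12 + n22]
    total = rows[0] + rows[1]
    return 2.0 * sum(cells[i][j] * math.log(cells[i][j] * total / (rows[i] * cols[j]))
                     for i in range(2) for j in range(2) if cells[i][j] > 0)


# Under [ACD][BCD], m_ijkl = n_i+kl * n_+jkl / n_++kl, so the model G^2 is the sum
# of the twenty year-by-race independence statistics, each with 1 degree of freedom.
per_table = {
    (k, l): g2_independence(SUCCESS_WAIVER[k][l], SUCCESS_NO_WAIVER[k][l],
                            FAILURE_WAIVER[k][l], FAILURE_NO_WAIVER[k][l])
    for k in range(5) for l in range(4)
}
g2_acd_bcd = sum(per_table.values())
df_acd_bcd = len(per_table)
largest = sorted(per_table, key=per_table.get, reverse=True)[:2]  # (year index, race index)
```

The two largest contributions come from Hispanics in FY89 and Caucasians in FY92, the same cases singled out above.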

It appears that the loglinear model must provide for some AB interaction terms. Accordingly, we apply the strategy of fitting the models with all three-way and lower-order terms, all two-way and lower-order terms, and all one-way terms. The model with the fewest terms and an acceptable overall fit is then used as a starting point for further deletion of terms within the chosen set. The first model fit was the one with all three-way interactions; it gives an overall fit with a p-value of .0387. However, as terms are deleted the p-value increases: the model [ABC][BCD][ACD] gives a slightly higher p-value for overall fit of .0657, and further deletion of terms leads to the model [ABC][BCD][AD] with p-value .22.

The fact that the deletion of additional terms appears to improve the fit can be explained by noting the increase in the degrees of freedom. For the model with all three-way interaction terms, the likelihood ratio test statistic is 21.95 with 12 degrees of freedom. Deleting the ABD term increases the degrees of freedom to 15 and the test statistic to 24.01, and the further deletion that leads to [ABC][BCD][AD] increases the degrees of freedom to 19 and the test statistic to 29.548. Therefore deleting terms does not increase the test statistic very much compared to the gain in degrees of freedom.

Deleting either the ABC or BCD term from the [AD][ABC][BCD] model results in models with much lower p-values for overall goodness of fit and standardized residuals of much larger magnitude than those of the [AD][ABC][BCD] model. Since the standardized residuals for this model range from -1.78 to 1.81, this model appears to give an adequate fit. In passing, we note that all AB interaction terms are modest in size.

The estimated probabilities of success given race, waiver status and fiscal year, $\hat p_{1|jkl}$, are plotted against year (k) in Figures 1 and 2. There is a general decrease in the probability of success over time in all four racial groups regardless of waiver status. In fact, when the model [AD][BD] is fit to each year separately, only 1992 fails to fit (p-value = .01). It appears that for the first four years this trend is reasonably well modeled as independent of waiver status. The presence of the ABC interaction term in the temporal model is a consequence of changes in 1992, specifically the outlier cell cited earlier.

The presence of the BCD interaction term can be explained by changes in the number of waivers used over time. To examine this, we fit a logistic regression model in which the response variable is one or zero according to whether an individual received a waiver or not, and the explanatory variables are year and race. Since year is an ordinal variable, it was scored as the integers 1 to 5 for the years 1988 to 1992. This saves degrees of freedom and helps detect monotonic trends.
Figure 1
The model with a cubic term in years gives an adequate fit to the data (p-value = .112). This model fits the data somewhat better than the model that treats the year as a categorical variable.
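The paper does not list the exact design of this logistic regression, so the sketch below should be read as one plausible version: additive race effects (Black, Hispanic and Other contrasted with Caucasian) plus a cubic polynomial in the year score, fitted by Newton-Raphson in plain Python. The year score is centred (FY88 = -2, ..., FY92 = 2) purely for numerical stability. A cubic in the centred score spans the same functions as a cubic in the scores 1 to 5, so the fit is unaffected. The counts are the Table 2 totals summed over OCS success and failure; all function names are hypothetical.

```python
import math

# Candidates with / without an ASVAB waiver (success + failure, from Table 2).
# Rows: FY88..FY92; columns: Caucasian, Black, Hispanic, Other.
WAIVER = [[122, 19, 15, 13], [172, 52, 23, 27], [137, 46, 30, 14],
          [98, 44, 20, 5], [115, 67, 30, 4]]
NO_WAIVER = [[1347, 62, 64, 126], [1856, 74, 102, 146], [1613, 127, 117, 147],
             [1443, 93, 116, 63], [1871, 137, 156, 92]]


def design_row(k, l):
    """Intercept, race indicators (Caucasian is the reference) and a cubic in year."""
    t = k - 2  # centred year score: FY88 -> -2, ..., FY92 -> 2
    return [1.0, float(l == 1), float(l == 2), float(l == 3), t, t ** 2, t ** 3]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (aug[r][size] - sum(aug[r][c] * x[c] for c in range(r + 1, size))) / aug[r][r]
    return x


def fit_logistic(max_iter=50, tol=1e-10):
    """Newton-Raphson for grouped binomial data: logit P(waiver) = x'beta."""
    groups = [(design_row(k, l), WAIVER[k][l], WAIVER[k][l] + NO_WAIVER[k][l])
              for k in range(5) for l in range(4)]
    p = len(groups[0][0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y, n in groups:
            prob = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            weight = n * prob * (1.0 - prob)
            for r in range(p):
                grad[r] += x[r] * (y - n * prob)
                for c in range(p):
                    hess[r][c] += weight * x[r] * x[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta, groups


def fitted_and_deviance(beta, groups):
    """Fitted waiver counts and residual deviance 2 sum[y ln(y/mu) + (n-y) ln((n-y)/(n-mu))]."""
    fitted, deviance = [], 0.0
    for x, y, n in groups:
        mu = n / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        fitted.append(mu)
        if y > 0:
            deviance += 2.0 * y * math.log(y / mu)
        if n > y:
            deviance += 2.0 * (n - y) * math.log((n - y) / (n - mu))
    return fitted, deviance


beta, groups = fit_logistic()
fitted, deviance = fitted_and_deviance(beta, groups)
df_resid = len(groups) - len(beta)  # 20 groups - 7 coefficients = 13
```

With 20 year-by-race groups and 7 coefficients the residual deviance has 13 degrees of freedom. Because the model contains an intercept and race indicators, the fitted waiver counts reproduce the observed totals for each race, which is a convenient check on convergence.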

The  fitted  values  are  the  estimates  of  the  conditional  probabilities  that  an 
officer  receives  a  waiver  given  year  and  race.  These  are  plotted  by  race  in 
Figure  3.  From  this  plot  it  can  be  seen  that  except  for  1989  there  has  been  a 
general  decline  in  the  proportion  of  waivers  awarded  for  each  race. 


Figure  3 
In conclusion, we have accounted for the nature of the paradox stated in the introduction by the use of loglinear analysis after collapsing the data over time. The odds ratio analysis supported the hypothesis that success is independent of waiver status at the micro level (within each year and racial group), and deeper loglinear modeling can be used to quantify the changes in probabilities as functions of race and time. The final analysis collapses the data over OCS success or failure and models the use of the waiver. Waiver use appears to be diminishing over time, but there are some rather prominent separations by race. Some additional study in these areas can be found in [3].
APPENDIX A

Algorithm  to  produce  p-values  for  the  hypergeometric  distribution. 

Let us view our basic 2x2 table as

             Waiver   No waiver   Total
Success        a          b         S
Failure        c          d         F
Total          n1         n2        N

In the context of the report, a is the number of successful candidates among the $n_1$ that used waivers; b is the number of successful candidates among the $n_2$ that did not use waivers, etc. The probabilistic structure used is a conditional one,

$$P(a \mid a + b = S) = \binom{n_1}{a}\binom{n_2}{b}\bigg/\binom{N}{S}, \qquad (A.1)$$


which  is  a  hypergeometric  probability  function.  For  the  present  purposes  it  is 
useful  to  describe  the  variable  range  constraints  rather  elaborately: 

$$\max(0,\ S - n_2) \le a \le \min(S,\ n_1)$$

$$\max(0,\ S - n_1) \le b \le \min(S,\ n_2)$$

$$\max(0,\ F - n_2) \le c \le \min(F,\ n_1)$$

$$\max(0,\ F - n_1) \le d \le \min(F,\ n_2)$$
Let us analyze the computations. Let $P_0$ be the value of (A.1) for the observed table. The p-value is the sum of all probabilities (A.1) that are less than or equal to $P_0$. Let

$$C = n_1!\,n_2!\,S!\,F!/N!.$$

Then (A.1) can be expressed as

$$P = \frac{C}{a!\,b!\,c!\,d!}. \qquad (A.2)$$

In the p-value computation the value of C is fixed and only the other factor in (A.2) changes as the summation takes place. It is often wise to use logarithms in the computation because the factorials can get quite large. Also, the two-sided p-value computation is managed by identifying the two tails of the distribution and summing their contributions.

Our approach is to first identify the variable (a, b, c, d) that has the shortest range in the specific situation. To do this we compute the empirical odds ratio

$$\hat\theta = \frac{ad}{bc}$$

and determine whether $\hat\theta < 1$ or $\hat\theta > 1$. This identifies the tail that contains the experimental result. That is, we view the testing problem as $H_0: p_1 = p_2$ vs. $H_1: p_1 \ne p_2$. The two estimators are

$$\hat p_1 = a/n_1 \quad\text{and}\quad \hat p_2 = b/n_2.$$

It is easily seen that $\hat p_1 < \hat p_2$ is equivalent to $\hat\theta < 1$, and the opposite case to $\hat\theta > 1$. Thus if $\hat\theta < 1$ we choose $M = \min(a, d)$ and sum the hypergeometric terms for that tail of the distribution. Of course, if $\hat\theta > 1$ we choose $M = \min(b, c)$ for the single tail sum. If $M = 0$ in either case, then $P_0$ is the total probability for that tail.

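To illustrate, we have

$$P_0 = \frac{C}{a!\,b!\,c!\,d!},$$

and for $\hat\theta < 1$ we form the successive terms

$$R_0 = 1,\quad R_1 = \frac{ad}{(b+1)(c+1)},\quad R_2 = R_1\frac{(a-1)(d-1)}{(b+2)(c+2)},\quad \ldots,\quad R_M = R_{M-1}\frac{(a+1-M)(d+1-M)}{(b+M)(c+M)}, \qquad (A.3)$$

and the single tail probability is $P_0\sum_{i=0}^{M} R_i$.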

On the other hand, if $\hat\theta > 1$ the R's are formed differently. That is,

$$R_0 = 1,\quad R_1 = \frac{bc}{(a+1)(d+1)},\quad R_2 = R_1\frac{(b-1)(c-1)}{(a+2)(d+2)},\quad \ldots,\quad R_M = R_{M-1}\frac{(b+1-M)(c+1-M)}{(a+M)(d+M)}. \qquad (A.4)$$
To manage the opposite tail let us redefine the R's in the following way. For the case $\hat\theta < 1$ we change to $M = \min(b, c)$ and choose

$$R_1 = \frac{bc}{(a+1)(d+1)},\quad R_2 = R_1\frac{(b-1)(c-1)}{(a+2)(d+2)},\quad \ldots,\quad R_M = R_{M-1}\frac{(b+1-M)(c+1-M)}{(a+M)(d+M)}, \qquad (A.5)$$

which matches (A.4) except that $R_0 = 1$ is not in the set. The opposite tail probability is obtained by summing

$$P_0\sum R_i \quad\text{over all } R_i \le 1. \qquad (A.6)$$

The opposite tail for the case $\hat\theta > 1$ is managed similarly. This time $M = \min(a, d)$, and we define a new set of R's according to the form of (A.3), but omitting $R_0 = 1$. Then apply formula (A.6).
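The recursions (A.2) to (A.6) translate directly into code. The sketch below is our own Python rendering (the function name is hypothetical), working with logarithms of factorials via math.lgamma as suggested above. The appendix does not say what to do when the empirical odds ratio equals 1 exactly; the sketch treats that case like the case greater than 1, and it uses a small relative tolerance when comparing each R_i with 1 to guard against floating-point ties.

```python
import math


def fisher_two_sided_appendix_a(a, b, c, d, rel_tol=1e-7):
    """Two-sided Fisher exact p-value using the tail recursions of Appendix A.

    Table layout as in Appendix A: a, b = successes with / without a waiver;
    c, d = failures with / without a waiver.
    """
    n1, n2, s, f = a + c, b + d, a + b, c + d
    total = n1 + n2
    # log C = log(n1! n2! S! F! / N!), then P0 = C / (a! b! c! d!)   (A.2)
    log_c = sum(math.lgamma(x + 1) for x in (n1, n2, s, f)) - math.lgamma(total + 1)
    p0 = math.exp(log_c - sum(math.lgamma(x + 1) for x in (a, b, c, d)))

    def ratios(x, y, u, v, m):
        """R_1..R_m with R_i = R_{i-1} (x+1-i)(y+1-i) / ((u+i)(v+i)), as in (A.3)-(A.5)."""
        r, out = 1.0, []
        for i in range(1, m + 1):
            r *= (x + 1 - i) * (y + 1 - i) / ((u + i) * (v + i))
            out.append(r)
        return out

    if a * d < b * c:
        # theta-hat < 1: the observed tail lies toward smaller a and d
        same = ratios(a, d, b, c, min(a, d))       # (A.3)
        opposite = ratios(b, c, a, d, min(b, c))   # (A.5)
    else:
        # theta-hat >= 1 (includes b*c == 0, where theta-hat is infinite)
        same = ratios(b, c, a, d, min(b, c))       # (A.4)
        opposite = ratios(a, d, b, c, min(a, d))   # (A.3) without R_0
    same_tail = p0 * (1.0 + sum(same))             # R_0 = 1 plus the remaining terms
    opposite_tail = p0 * sum(r for r in opposite if r <= 1.0 + rel_tol)  # (A.6)
    return min(1.0, same_tail + opposite_tail)     # guard against rounding above 1
```

For the Other/FY92 table, a = 4, b = 67, c = 0, d = 25, the empirical odds ratio is infinite and M = min(b, c) = 0, so the observed tail is P0 alone, while the opposite tail picks up the tables with a = 2, 1, 0 (the table with a = 3 is more probable than the observed one and is excluded). The total is about .570, as in Table 3.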
APPENDIX B

The estimated coefficients, their standard errors and t-values for the model [AD][ABC][BCD] are given in Table B1. The coefficients are constrained so that one level of each factor has a coefficient that is set to zero. For example, for factor A there is only one estimated coefficient, $\lambda_1^A$, corresponding to success at OCS; the coefficient corresponding to failure in OCS, $\lambda_2^A$, is set to zero. Thus, the estimated value .9438 is a contrast, and the t-value 9.24 tests the null hypothesis that the main effects for levels 1 and 2 of factor A are the same. Since A has only 2 levels and $\lambda_2^A$ is fixed at zero, this is equivalent to $H_0: \lambda_1^A = \lambda_2^A = 0$. The main effects in Table B1 are labeled as follows:


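A     $\lambda_1^A$   (Success in OCS)
B     $\lambda_1^B$   (Waiver)
C1    $\lambda_2^C$   (FY89)
C2    $\lambda_3^C$   (FY90)
C3    $\lambda_4^C$   (FY91)
C4    $\lambda_5^C$   (FY92)
D1    $\lambda_3^D$   (Hispanic)
D2    $\lambda_4^D$   (Other)
D3    $\lambda_1^D$   (Caucasian)

The remaining main effects ($\lambda_2^A$, $\lambda_2^B$, $\lambda_1^C$ and $\lambda_2^D$, i.e., failure, no waiver, FY88 and Black) are set to zero. Interaction terms are similarly labeled.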

Table B1

Term          Value    Std. Error   t-value
(Intercept)   2.855      0.147       19.45
A             0.944      0.102        9.24
D1           -0.135      0.198       -0.68
D2            0.481      0.180        2.67
D3            2.640      0.144       18.35
B            -1.122      0.297       -3.77
C1            0.151      0.183        0.83
C2            0.910      0.165        5.50
C3            0.820      0.173        4.73
C4            1.087      0.163        6.68
A:D1          0.225      0.116        1.94
A:D2          0.304      0.121        2.51
A:D3          0.576      0.085        6.80
A:B          -0.085      0.199       -0.43
A:C1          0.036      0.085        0.42
A:C2         -0.278      0.083       -3.36
A:C3         -0.638      0.083       -7.70
A:C4         -0.438      0.080       -5.48
B:C1          0.888      0.364        2.44
B:C2          0.182      0.358        0.51
B:C3          0.300      0.365        0.82
B:C4          0.633      0.342        1.85
B:D1         -0.264      0.389       -0.68
B:D2         -1.084      0.392       -2.76
B:D3         -1.217      0.280       -4.35
C:D1          0.288      0.235        1.23
C:D2         -0.025      0.211       -0.12
C:D3          0.133      0.176        0.76
C:D4         -0.101      0.220       -0.46
C:D5         -0.546      0.197       -2.77
C:D6         -0.513      0.160       -3.22
C:D7          0.220      0.227        0.97
C:D8         -1.057      0.226       -4.68
C:D9         -0.269      0.169       -1.59
C:D10         0.119      0.214        0.56
C:D11        -1.080      0.206       -5.25
C:D12        -0.422      0.158       -2.68
A:B:C1       -0.083      0.252       -0.33
A:B:C2       -0.031      0.254       -0.12
A:B:C3        0.210      0.266        0.79
A:B:C4       -0.306      0.249       -1.23
B:C:D1       -0.865      0.487       -1.78
B:C:D2       -0.075      0.472       -0.16
B:C:D3       -0.752      0.493       -1.52
B:C:D4       -0.648      0.462       -1.40
B:C:D5       -0.248      0.480       -0.52
B:C:D6       -0.245      0.512       -0.48
B:C:D7       -0.710      0.635       -1.12
B:C:D8       -1.308      0.661       -1.98
B:C:D9       -0.792      0.343       -2.31
B:C:D10      -0.220      0.341       -0.65
B:C:D11      -0.740      0.351       -2.11
B:C:D12      -0.806      0.332       -2.43
Table B2 contains the fitted cell means along with the standardized residuals. The standardized residuals are plotted against the fitted values in Figure B1.


Table B2

Cell   Count   FY     Race     Fitted Value   Std. Residual
  1      100   FY88   Cauc.         98.540           0.147
  2       11          Black         13.347          -0.663
  3       10          Hisp.         11.208          -0.368
  4       12          Other          9.905           0.644
  5      142   FY89   Cauc.        137.662           0.368
  6       37          Black         36.016           0.163
  7       12          Hisp.         16.980          -1.276
  8       20          Other         20.342          -0.076
  9      102   FY90   Cauc.        103.464          -0.144
 10       30          Black         29.175           0.152
 11       20          Hisp.         20.539          -0.119
 12       11          Other          9.822           0.369
 13       77   FY91   Cauc.         71.784           0.608
 14       22          Black         26.670          -0.933
 15       14          Hisp.         13.166           0.227
 16        2          Other          3.380          -0.813
 17       70   FY92   Cauc.         76.628          -0.768
 18       36          Black         35.432           0.095
 19       22          Hisp.         17.527           1.027
 20        4          Other          2.414           0.932
 21     1113   FY88   Cauc.       1112.644           0.011
 22       48          Black         44.632           0.498
 23       48          Hisp.         48.823          -0.118
 24       95          Other         97.901          -0.295
 25     1533   FY89   Cauc.       1532.608           0.010
 26       56          Black         53.801           0.298
 27       80          Hisp.         78.468           0.172
 28      111          Other        115.123          -0.387
 29     1263   FY90   Cauc.       1251.554           0.323
 30       77          Black         83.893          -0.763
 31       76          Hisp.         82.953          -0.774
 32      109          Other        106.601           0.232
 33     1013   FY91   Cauc.       1020.580          -0.238
 34       58          Black         53.558           0.599
 35       78          Hisp.         73.037           0.574
 36       39          Other         40.826          -0.288
 37     1390   FY92   Cauc.       1397.537          -0.202
 38       87          Black         85.477           0.164
 39      108          Hisp.        105.300           0.262
 40       67          Other         63.687           0.412
 41       22   FY88   Cauc.         23.460          -0.305
 42        8          Black          5.653           0.928
 43        5          Hisp.          3.792           0.591
 44        1          Other          3.095          -1.389
 45       30   FY89   Cauc.         34.338          -0.757
 46       15          Black         15.984          -0.249
 47       11          Hisp.          6.020           1.817
 48        7          Other          6.658           0.131
 49       35   FY90   Cauc.         33.536           0.251
 50       16          Black         16.825          -0.203
 51       10          Hisp.          9.461           0.174
 52        3          Other          4.178          -0.607
 53       21   FY91   Cauc.         26.216          -1.056
 54       22          Black         17.330           1.076
 55        6          Hisp.          6.834          -0.326
 56        3          Other          1.620           0.968
 57       45   FY92   Cauc.         38.372           1.041
 58       31          Black
APPENDIX C

Short  analysis  of  the  model  [ACD]  [BCD]. 

This model features the conditional independence of factors A and B given the levels of C and D, coupled with a fully saturated model for the joint distribution of C and D. Thus the loglinear representation can be made more succinct than the direct representation. Since

$$P_{ij|k\ell} = P_{i|k\ell}\, P_{j|k\ell},$$

the maximum likelihood estimates of the two factors on the right-hand side are

$$\hat P_{i|k\ell} = \frac{n_{i+k\ell}}{n_{++k\ell}} \quad\text{and}\quad \hat P_{j|k\ell} = \frac{n_{+jk\ell}}{n_{++k\ell}},$$

respectively. It follows that for each $(k, \ell)$ pair the loglinear model of the left-hand side may be expressed as

$$\ln m_{ijk\ell} = \text{const} + \lambda^{A}_{ik\ell} + \lambda^{B}_{jk\ell},$$

and estimates of these parameters can be obtained rather easily from the twenty $2 \times 2$ tables that lie behind Table 3. The maximum likelihood estimators of $m_{ijk\ell}$ are

$$\hat m_{ijk\ell} = \frac{n_{i+k\ell}\, n_{+jk\ell}}{n_{++k\ell}},$$

which match the expected frequencies in the $2 \times 2$ contingency table computations. Next, the model calls for the saturated version of $p_{k\ell}$, the joint distribution of C and D, so that

$$\ln m_{++k\ell} = \mu + \lambda^{C}_{k} + \lambda^{D}_{\ell} + \lambda^{CD}_{k\ell}$$

with the customary constraints. The maximum likelihood estimators are

$$\hat m_{++k\ell} = n_{++k\ell},$$

and the rules of conditional and marginal expectation,

$$m_{ijk\ell} = m_{++k\ell}\, P_{ij|k\ell},$$

lead to the estimates

$$\hat m_{ijk\ell} = \frac{\hat m_{i+k\ell}\, \hat m_{+jk\ell}}{\hat m_{++k\ell}}.$$
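Because the estimate factorizes within each $(k, \ell)$ layer, the fitted values of [ACD] [BCD] can be computed one two-way table at a time. The minimal sketch below uses hypothetical counts (not the study's data), and the dictionary keyed by $(k, \ell)$ is a layout chosen only for the example. Summing a fitted layer over B or over A returns the observed margins $n_{i+k\ell}$ and $n_{+jk\ell}$, and its total is $n_{++k\ell}$.

```python
def fit_acd_bcd(tables):
    """Fitted counts m_ijkl = n_i+kl * n_+jkl / n_++kl for the model [ACD][BCD].

    `tables` maps each (k, l) pair to its table of counts n[i][j], with rows
    indexing factor A and columns indexing factor B.
    """
    fitted = {}
    for kl, n in tables.items():
        row = [sum(r) for r in n]            # n_{i+kl}
        col = [sum(c) for c in zip(*n)]      # n_{+jkl}
        total = sum(row)                     # n_{++kl}, the saturated (C, D) margin
        fitted[kl] = [[row[i] * col[j] / total for j in range(len(col))]
                      for i in range(len(row))]
    return fitted


# Hypothetical counts for two (k, l) layers, for illustration only.
tables = {(1, 1): [[10, 20], [30, 40]], (1, 2): [[5, 15], [25, 55]]}
fitted = fit_acd_bcd(tables)
for kl, m in fitted.items():
    # Row sums, column sums and the layer total of m reproduce those of n.
    print(kl, m)
```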
This result is especially convenient in that it allows chi-squared test statistics for the model [ACD] [BCD] to be constructed merely by summing the individual chi-squared statistics computed from the original twenty contingency tables. The degrees of freedom for this sum are the total of the individual table degrees of freedom (one for each $2 \times 2$ table, so twenty in all).
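A minimal sketch of that bookkeeping follows, using Pearson's $X^2$ and two hypothetical $2 \times 2$ tables (the study's twenty tables are not reproduced here). The same additivity holds for the likelihood ratio statistic $G^2$, since the fitted values are computed layer by layer.

```python
def pearson_x2(n):
    """Pearson chi-squared statistic for independence in a two-way table n."""
    row = [sum(r) for r in n]
    col = [sum(c) for c in zip(*n)]
    total = sum(row)
    x2 = 0.0
    for i, r in enumerate(n):
        for j, obs in enumerate(r):
            expected = row[i] * col[j] / total
            x2 += (obs - expected) ** 2 / expected
    return x2


def combined_x2(tables):
    """Sum the per-table statistics and degrees of freedom, as for [ACD][BCD]."""
    x2 = sum(pearson_x2(n) for n in tables)
    # Each r x c table contributes (r - 1)(c - 1) degrees of freedom: 1 for a 2 x 2.
    df = sum((len(n) - 1) * (len(n[0]) - 1) for n in tables)
    return x2, df


# Hypothetical 2 x 2 tables, one per (k, l) layer; the study has twenty.
tables = [[[10, 20], [30, 40]], [[5, 15], [25, 55]]]
x2, df = combined_x2(tables)
print(f"X^2 = {x2:.4f} on {df} degrees of freedom")
```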
APPENDIX D

This appendix contains the details of a logistic regression model in which the response variable is whether or not an individual possesses a waiver and the explanatory variables are years and race. Following the notation established in the paper, for $k = 1, \dots, 5$ and $\ell = 1, \dots, 4$ let $P_{k\ell}$ be the probability that an individual of race $\ell$ in year $k$ possesses a waiver, that is, the expected number of waiver holders in cell $(k, \ell)$ divided by $m_{++k\ell}$. Because years is an ordinal variable, it is treated as numeric, with FY88, ..., FY92 scored as 1, ..., 5, respectively. The logistic regression models fitted to $\ln(P_{k\ell}/(1 - P_{k\ell}))$, along with the likelihood ratio test statistic $G^2$ and the corresponding p-values, are as follows:


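Model                                                                                        G^2     degrees of freedom   p-value
1. $\mu + \lambda^{C}_{1} k + \lambda^{D}_{\ell}$                                             25.11   15                   .048
2. $\mu + \lambda^{C}_{1} k + \lambda^{C}_{2} k^2 + \lambda^{D}_{\ell}$                        23.43   14                   .054
3. $\mu + \lambda^{C}_{1} k + \lambda^{C}_{2} k^2 + \lambda^{C}_{3} k^3 + \lambda^{D}_{\ell}$   19.36   13                   .112
4. $\mu + \lambda^{C}_{1} k + \lambda^{C}_{2} k^2 + \lambda^{C}_{3} k^3 + \lambda^{C}_{4} k^4 + \lambda^{D}_{\ell}$   18.95   12   .090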

The fits of models 1 and 2 are inadequate. This is confirmed in Figures D1 and D2, where the standardized residuals are plotted against years; the pattern of the residuals in both figures suggests that higher-order polynomials in k need to be fitted to the data. The fit of model 3 is acceptable, and its residuals (Figure D3) appear to be evenly scattered when plotted against years. The test of model 3 against model 4 has likelihood ratio test statistic 19.36 - 18.95 = 0.41 with 1 degree of freedom and p-value .52. Note that model 4 is equivalent to fitting the logistic regression model in which both race and years are treated as categorical variables.
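The p-values quoted above can be reproduced from the upper tail of the chi-squared distribution. The sketch below uses only the Python standard library and the closed forms available for integer degrees of freedom; `chi2_sf` is a helper written for this example, and a library routine such as SciPy's `scipy.stats.chi2.sf` would serve equally well.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        h = x / 2.0
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: 2 * P(Z > sqrt(x)) plus 2 * phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    s = math.sqrt(x)
    tail = math.erfc(s / math.sqrt(2.0))
    dens = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, total = s, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + 2.0 * dens * total


# (model, G^2, degrees of freedom) from the table above.
MODELS = [(1, 25.11, 15), (2, 23.43, 14), (3, 19.36, 13), (4, 18.95, 12)]
for model, g2, df in MODELS:
    print(model, g2, df, round(chi2_sf(g2, df), 3))

# Likelihood ratio test of model 3 against model 4.
lr = 19.36 - 18.95
print(round(lr, 2), round(chi2_sf(lr, 1), 2))
```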

The standardized residuals and fitted values given in Table D1 are plotted in Figure D4.

Table D1

Race   Year   Fitted $P_{k\ell}$   Standardized Residual
Cauc.  FY88   0.0775    0.7799
Cauc.  FY89   0.0876   -0.4536
Cauc.  FY90   0.0760    0.3658
Cauc.  FY91   0.0627    0.1402
Cauc.  FY92   0.0618   -0.7288
Black  FY88   0.3361   -1.9954
Black  FY89   0.3665    1.0672
Black  FY90   0.3311   -1.8583
Black  FY91   0.2873    0.8670
Black  FY92   0.2841    1.3853
Hisp.  FY88   0.1882    0.0378
Hisp.  FY89   0.2094   -0.7101
Hisp.  FY90   0.1848    0.5946
Hisp.  FY91   0.1558   -0.2835
Hisp.  FY92   0.1537    0.2836
Other  FY88   0.1010   -0.2953
Other  FY89   0.1138    1.6707
Other  FY90   0.0990   -0.5201
Other  FY91   0.0821   -0.2613
Other  FY92   0.0809   -1.5438

The coefficients $\lambda^{D}_{\ell}$, $\ell = 1, \dots, 4$, corresponding to the factor race are overparameterized without an additional constraint. Familiar constraints are the "sum to zero" and the "set to zero" constraints, and statistical packages usually use one or the other. S-PLUS, the package used for the analysis presented in this paper, uses neither of these constraints. Instead, let
$$x_{k1} = (1, k, k^2, k^3, 0, 0, 3)^{T},$$
$$x_{k2} = (1, k, k^2, k^3, -1, -1, -1)^{T},$$
$$x_{k3} = (1, k, k^2, k^3, 1, -1, -1)^{T},$$
$$x_{k4} = (1, k, k^2, k^3, 0, 2, -1)^{T}.$$
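Then, for example, in model 3 the fitted values $\hat P_{k\ell}$ can be found from

$$\ln\left(\frac{\hat P_{k\ell}}{1 - \hat P_{k\ell}}\right) = x_{k\ell}^{T}\hat\beta,$$

where $\hat\beta = (\hat\beta_1, \hat\beta_2, \dots, \hat\beta_7)^{T}$ (along with estimated standard errors and t-values) is given in Table D2.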


Table D2

 i   Term                $\hat\beta_i$   Std. Error   t-value
 1   (Intercept)         -2.3634          .3800         -6.219
 2   k                    1.0065          .4822          2.0687
 3   k^2                  -.3842          .1791         -2.145
 4   k^3                   .0399          .0198          2.012
 5   race (Helmert 1)     -.3906          .0647         -6.034
 6   race (Helmert 2)     -.3717          .0496         -7.501
 7   race (Helmert 3)     -.2583          .0186        -13.9202

S-PLUS uses Helmert contrasts to generate the linear combinations of the parameters used for each level of a categorical factor.
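To make the coding concrete, the sketch below rebuilds the model 3 fitted probabilities from the coefficients in Table D2. It assumes that $\ell = 1, \dots, 4$ follows the order Caucasian, Black, Hispanic, Other used in Table D1, so that Caucasian candidates receive the race columns (0, 0, 3). With that assignment the results agree with Table D1 to about three decimal places; the small differences come from the rounding of the published coefficients. The implied race effects (the race columns of $x_{k\ell}$ multiplied by $\hat\beta_5, \hat\beta_6, \hat\beta_7$) sum to zero, because each Helmert column does.

```python
import math

# Estimated coefficients of model 3 from Table D2, in the order of x_kl:
# intercept, k, k^2, k^3, then the three race (Helmert) columns.
BETA = [-2.3634, 1.0065, -0.3842, 0.0399, -0.3906, -0.3717, -0.2583]

# Race columns of x_kl. The level-to-row assignment is an assumption that
# reproduces Table D1.
RACE_COLUMNS = {
    "Cauc.": (0, 0, 3),
    "Black": (-1, -1, -1),
    "Hisp.": (1, -1, -1),
    "Other": (0, 2, -1),
}


def fitted_probability(k, race):
    """Fitted waiver probability for year score k (FY88 = 1, ..., FY92 = 5)."""
    x = [1, k, k ** 2, k ** 3, *RACE_COLUMNS[race]]
    logit = sum(xi * bi for xi, bi in zip(x, BETA))  # x_kl^T beta
    return 1.0 / (1.0 + math.exp(-logit))


def race_effect(race):
    """Implied race effect: race columns of x_kl times (beta_5, beta_6, beta_7)."""
    return sum(c * b for c, b in zip(RACE_COLUMNS[race], BETA[4:]))


for race in RACE_COLUMNS:
    probs = [round(fitted_probability(k, race), 4) for k in range(1, 6)]
    print(race, probs, round(race_effect(race), 4))
```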
Figure D4. Standardized Residuals vs. Fitted Probabilities